{
  "cells": [
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/10-langchain-multi-query.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/08-langchain-multi-query.ipynb)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "2-XDGL6Oi6h4"
      },
      "source": [
        "#### [LangChain Handbook](https://pinecone.io/learn/langchain)\n",
        "\n",
        "# LangChain Multi-Query for RAG"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "qi8B1fgywJzE"
      },
      "outputs": [],
      "source": [
        "!pip install -qU \\\n",
        "  pinecone-client==3.1.0 \\\n",
        "  langchain==0.1.1 \\\n",
        "  langchain-community==0.0.13 \\\n",
        "  datasets==2.14.6 \\\n",
        "  openai==1.6.1 \\\n",
        "  tiktoken==0.5.2"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "CPmfrdJ9_2YA"
      },
      "source": [
        "## Getting Data"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "S4Py-rVqx-I0"
      },
      "source": [
        "We will download an existing dataset from Hugging Face Datasets."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "id": "iatOGmKgz8NE"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "Dataset({\n",
              "    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],\n",
              "    num_rows: 41584\n",
              "})"
            ]
          },
          "execution_count": 1,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "data = load_dataset(\"jamescalam/ai-arxiv-chunked\", split=\"train\")\n",
        "data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "P7E6JYtb0cW7"
      },
      "outputs": [],
      "source": [
        "from langchain.docstore.document import Document\n",
        "\n",
        "docs = []\n",
        "\n",
        "for row in data:\n",
        "    doc = Document(\n",
        "        page_content=row[\"chunk\"],\n",
        "        metadata={\n",
        "            \"title\": row[\"title\"],\n",
        "            \"source\": row[\"source\"],\n",
        "            \"id\": row[\"id\"],\n",
        "            \"chunk-id\": row[\"chunk-id\"],\n",
        "            \"text\": row[\"chunk\"]\n",
        "        }\n",
        "    )\n",
        "    docs.append(doc)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "yb540kEs_6PZ"
      },
      "source": [
        "## Embedding and Vector DB Setup"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "BlKEmBZMBxtd"
      },
      "source": [
        "Initialize our embedding model:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "qZ6vTiDPBznz"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "from getpass import getpass\n",
        "from langchain.embeddings.openai import OpenAIEmbeddings\n",
        "\n",
        "model_name = \"text-embedding-ada-002\"\n",
        "\n",
        "# get openai api key from platform.openai.com\n",
        "OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass(\"OpenAI API Key: \")\n",
        "\n",
        "embed = OpenAIEmbeddings(\n",
        "    model=model_name, openai_api_key=OPENAI_API_KEY, disallowed_special=()\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "IurEkeeI-IYl"
      },
      "source": [
        "Now we create our vector DB to store our vectors. For this we need to get a [free Pinecone API key](https://app.pinecone.io) — the API key can be found in the \"API Keys\" button found in the left navbar of the Pinecone dashboard."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [],
      "source": [
        "from pinecone import Pinecone\n",
        "\n",
        "# initialize connection to pinecone (get API key at app.pinecone.io)\n",
        "api_key = os.getenv(\"PINECONE_API_KEY\") or getpass(\"Enter your Pinecone API key: \")\n",
        "\n",
        "# configure client\n",
        "pc = Pinecone(api_key=api_key)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [],
      "source": [
        "from pinecone import ServerlessSpec\n",
        "\n",
        "spec = ServerlessSpec(\n",
        "    cloud=\"aws\", region=\"us-west-2\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Creating an index, we set `dimension` equal to to dimensionality of Ada-002 (`1536`), and use a `metric` also compatible with Ada-002 (this can be either `cosine` or `dotproduct`). We also pass our `spec` to index initialization."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "nL3KFF9E9Qb_"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'dimension': 1536,\n",
              " 'index_fullness': 0.0,\n",
              " 'namespaces': {},\n",
              " 'total_vector_count': 0}"
            ]
          },
          "execution_count": 6,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import time\n",
        "\n",
        "index_name = \"langchain-multi-query-demo\"\n",
        "existing_indexes = [\n",
        "    index_info[\"name\"] for index_info in pc.list_indexes()\n",
        "]\n",
        "\n",
        "# check if index already exists (it shouldn't if this is first time)\n",
        "if index_name not in existing_indexes:\n",
        "    # if does not exist, create index\n",
        "    pc.create_index(\n",
        "        index_name,\n",
        "        dimension=1536,  # dimensionality of ada 002\n",
        "        metric='dotproduct',\n",
        "        spec=spec\n",
        "    )\n",
        "    # wait for index to be initialized\n",
        "    while not pc.describe_index(index_name).status['ready']:\n",
        "        time.sleep(1)\n",
        "\n",
        "# connect to index\n",
        "index = pc.Index(index_name)\n",
        "time.sleep(1)\n",
        "# view index stats\n",
        "index.describe_index_stats()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "A3B7dHsd6QcP"
      },
      "source": [
        "Populate our index:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "B7Yi2YGBpTWf"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "41584"
            ]
          },
          "execution_count": 7,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "len(docs)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "thfCYHuSpW4H"
      },
      "outputs": [],
      "source": [
        "# if you want to speed things up to follow along\n",
        "#docs = docs[:5000]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "HXVVU97C6SwT"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "7ccacf3923234dd5821880d7942218c7",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "  0%|          | 0/416 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "from tqdm.auto import tqdm\n",
        "\n",
        "batch_size = 100\n",
        "\n",
        "for i in tqdm(range(0, len(docs), batch_size)):\n",
        "    i_end = min(len(docs), i+batch_size)\n",
        "    docs_batch = docs[i:i_end]\n",
        "    # get IDs\n",
        "    ids = [f\"{doc.metadata['id']}-{doc.metadata['chunk-id']}\" for doc in docs_batch]\n",
        "    # get text and embed\n",
        "    texts = [d.page_content for d in docs_batch]\n",
        "    embeds = embed.embed_documents(texts=texts)\n",
        "    # get metadata\n",
        "    metadata = [d.metadata for d in docs_batch]\n",
        "    to_upsert = zip(ids, embeds, metadata)\n",
        "    index.upsert(vectors=to_upsert)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "8FbngTBzAAU-"
      },
      "source": [
        "## Multi-Query with LangChain"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "YVVYr13n_Ot2"
      },
      "source": [
        "Now we switch across to using our populated index as a vectorstore in Langchain."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "0ETs0emsAh-K",
        "outputId": "0b1de24b-2f9f-48a6-d8ca-bd3d6aa007e1"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/Users/jamesbriggs/opt/anaconda3/envs/ml/lib/python3.9/site-packages/langchain_community/vectorstores/pinecone.py:74: UserWarning: Passing in `embedding` as a Callable is deprecated. Please pass in an Embeddings object instead.\n",
            "  warnings.warn(\n"
          ]
        }
      ],
      "source": [
        "from langchain.vectorstores import Pinecone\n",
        "\n",
        "text_field = \"text\"\n",
        "\n",
        "vectorstore = Pinecone(index, embed.embed_query, text_field)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "id": "nW_GCB6a3_N_"
      },
      "outputs": [],
      "source": [
        "from langchain.chat_models import ChatOpenAI\n",
        "\n",
        "llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "1iptBAriANrD"
      },
      "source": [
        "We initialize the `MultiQueryRetriever`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "id": "yYjztBp2ANHC"
      },
      "outputs": [],
      "source": [
        "from langchain.retrievers.multi_query import MultiQueryRetriever\n",
        "\n",
        "retriever = MultiQueryRetriever.from_llm(\n",
        "    retriever=vectorstore.as_retriever(), llm=llm\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "H8qZCd1TAMAn"
      },
      "source": [
        "We set logging so that we can see the queries as they're generated by our LLM."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "id": "rgV1eYU6FgX7"
      },
      "outputs": [],
      "source": [
        "# Set logging for the queries\n",
        "import logging\n",
        "\n",
        "logging.basicConfig()\n",
        "logging.getLogger(\"langchain.retrievers.multi_query\").setLevel(logging.INFO)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "jrjwkpJWAaAn"
      },
      "source": [
        "To query with our multi-query retriever we call the `get_relevant_documents` method."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_DJ4cSJXFinV",
        "outputId": "265900d1-6aa7-4d28-cbbe-e2e95b7df7b4"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you provide information about llama 2 and its characteristics?', '2. What can you tell me about llama 2 and its features?', '3. Could you give me an overview of llama 2 and its properties?']\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "6"
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "question = \"tell me about llama 2?\"\n",
        "\n",
        "docs = retriever.get_relevant_documents(query=question)\n",
        "len(docs)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "kSu1GsFfAqCd"
      },
      "source": [
        "From this we get a variety of docs retrieved by each of our queries independently. By default the `retriever` is returning `3` docs for each query — totalling `9` documents — however, as there is some overlap we actually return `6` unique docs."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ce5WBh6MFltP",
        "outputId": "f7b06949-e2a6-472e-eaf9-e712dc4bcca2"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\\nSergey Edunov Thomas Scialom\\x03\\nGenAI, Meta\\nAbstract\\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\\nmodels outperform open-source chat models on most benchmarks we tested, and based on\\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'chunk-id': '1', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'}),\n",
              " Document(page_content='asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyﬁne-tunedtoalignwithhuman\\npreferences, which greatly enhances their usability and safety. This step can require signiﬁcant costs in\\ncomputeandhumanannotation,andisoftennottransparentoreasilyreproducible,limitingprogresswithin\\nthe community to advance AI alignment research.\\nIn this work, we develop and release Llama 2, a family of pretrained and ﬁne-tuned LLMs, L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle and\\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested,\\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc models generally perform better than existing open-source models. They also appear to\\nbe on par with some of the closed-source models, at least on the human evaluations we performed (see', metadata={'chunk-id': '9', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'}),\n",
              " Document(page_content='Q:Yes or no: Could a llama birth twice during War in Vietnam (1945-46)?\\nA:TheWar inVietnam was6months. Thegestationperiod forallama is11months, which ismore than 6\\nmonths. Thus, allama could notgive birth twice duringtheWar inVietnam. So the answer is no.\\nQ:Yes or no: Would a pear sink in water?\\nA:Thedensityofapear isabout 0:6g=cm3,which islessthan water.Objects lessdense than waterﬂoat. Thus,\\napear would ﬂoat. So the answer is no.\\nTable 26: Few-shot exemplars for full chain of thought prompt for Date Understanding.\\nPROMPT FOR DATE UNDERSTANDING\\nQ:2015 is coming in 36 hours. What is the date one week from today in MM/DD/YYYY?\\nA:If2015 iscomingin36hours, then itiscomingin2days. 2days before01/01/2015 is12/30/2014, sotoday\\nis12/30/2014. Sooneweek from todaywillbe01/05/2015. So the answer is 01/05/2015.', metadata={'chunk-id': '137', 'id': '2201.11903', 'source': 'http://arxiv.org/pdf/2201.11903', 'title': 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'}),\n",
              " Document(page_content='Dubois, Xuechen Li, Carlos Guestrin, Percy Liang,\\nand Tatsunori B. Hashimoto. 2023. Stanford alpaca:\\nAn instruction-following llama model. https://\\ngithub.com/tatsu-lab/stanford_alpaca .\\nRomal Thoppilan, Daniel De Freitas, Jamie Hall, Noam\\nShazeer, Apoorv Kulshreshtha, Heng-Tze Cheng,\\nAlicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al.\\n2022. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239 .\\nHugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier\\nMartinet, Marie-Anne Lachaux, Timothée Lacroix,\\nBaptiste Rozière, Naman Goyal, Eric Hambro, Faisal\\nAzhar, Aurelien Rodriguez, Armand Joulin, Edouard\\nGrave, and Guillaume Lample. 2023. Llama: Open\\nand efficient foundation language models. arXiv\\npreprint arXiv:2302.13971 .', metadata={'chunk-id': '37', 'id': '2304.01196', 'source': 'http://arxiv.org/pdf/2304.01196', 'title': 'Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data'}),\n",
              " Document(page_content=\"user asks weather, which needs more specific and updated information. I should\\nfollow rules 3 (helpful), 7 (candor), 9 (static).\\nWatson: As an AI language model, I have no access to weather sensors or real-time\\nweather data. However, I can recommend you to:\\n1. check your local weather forecast on a weather website or app,\\n2. or by watching the news or checking your local weather station.\\nUser: Good job! Clear context\\nWatson (auto reply): Thank you! For further questions or guidance, just reach out.\\nUser: Tell me about alpaca 's family\\nWatson (internal thoughts): I am a helpful, ethical, and reliable assistant. The\\nuser asks alpaca, which is a public entity in one of my internal knowledge\\nbases: Wikipedia. I should follow rules 2 (informative), 6 (multi-aspect), 8 (\\nknowledge recitation).\\nWatson: According to my knowledge from Wikipedia, here is what I know about alpaca 's\\nfamily:\\nThe alpaca (Lama pacos) is a species of South American camelid mammal. It is similar\\nto, and often confused with, the llama. However, alpacas are often noticeably\\nsmaller than llamas.\\nRegarding their family, alpacas belong to the Camelidae family, which includes other\", metadata={'chunk-id': '112', 'id': '2305.03047', 'source': 'http://arxiv.org/pdf/2305.03047', 'title': 'Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision'}),\n",
              " Document(page_content='models will be released as we improve model safety with community feedback.\\nLicense A custom commercial license is available at: ai.meta.com/resources/\\nmodels-and-libraries/llama-downloads/\\nWhere to send commentsInstructions on how to provide feedback or comments on the model can be\\nfound in the model README, or by opening an issue in the GitHub repository\\n(https://github.com/facebookresearch/llama/ ).\\nIntended Use\\nIntended Use Cases L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle is intended for commercial and research use in English. Tuned models\\nare intended for assistant-like chat, whereas pretrained models can be adapted\\nfor a variety of natural language generation tasks.\\nOut-of-Scope Uses Use in any manner that violates applicable laws or regulations (including trade\\ncompliancelaws). UseinlanguagesotherthanEnglish. Useinanyotherway\\nthat is prohibited by the Acceptable Use Policy and Licensing Agreement for\\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle.\\nHardware and Software (Section 2.2)\\nTraining Factors We usedcustomtraininglibraries, Meta’sResearchSuperCluster, andproductionclustersforpretraining. Fine-tuning,annotation,andevaluationwerealso', metadata={'chunk-id': '317', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'})]"
            ]
          },
          "execution_count": 15,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "docs"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "KLMwfZPfBF89"
      },
      "source": [
        "## Adding the Generation in RAG"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "X79eNNL_BM4G"
      },
      "source": [
        "So far we've built a multi-query powered **R**etrieval **A**ugmentation chain. Now, we need to add **G**eneration."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "id": "jNnXYOtqypiz"
      },
      "outputs": [],
      "source": [
        "from langchain.prompts import PromptTemplate\n",
        "from langchain.chains import LLMChain\n",
        "\n",
        "QA_PROMPT = PromptTemplate(\n",
        "    input_variables=[\"query\", \"contexts\"],\n",
        "    template=\"\"\"You are a helpful assistant who answers user queries using the\n",
        "    contexts provided. If the question cannot be answered using the information\n",
        "    provided say \"I don't know\".\n",
        "\n",
        "    Contexts:\n",
        "    {contexts}\n",
        "\n",
        "    Question: {query}\"\"\",\n",
        ")\n",
        "\n",
        "# Chain\n",
        "qa_chain = LLMChain(llm=llm, prompt=QA_PROMPT)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 123
        },
        "id": "h6GVEZkhytdM",
        "outputId": "f03086b8-8d30-4d6e-a723-833ffecbcf8e"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "'Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. The fine-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are optimized for dialogue use cases. These models outperform open-source chat models on most benchmarks and are considered a suitable substitute for closed-source models based on humane evaluations for helpfulness and safety. The approach to fine-tuning and safety is described in detail.'"
            ]
          },
          "execution_count": 17,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "out = qa_chain(\n",
        "    inputs={\n",
        "        \"query\": question,\n",
        "        \"contexts\": \"\\n---\\n\".join([d.page_content for d in docs])\n",
        "    }\n",
        ")\n",
        "out[\"text\"]"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "KemgDCg8DkgE"
      },
      "source": [
        "## Chaining Everything with a SequentialChain"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "kTbLlWgEEII1"
      },
      "source": [
        "We can pull together the logic above into a function or set of methods, whatever is prefered — however if we'd like to use LangChain's approach to this we must \"chain\" together multiple chains. The first retrieval component is (1) not a chain per se, and (2) requires processing of the output. To do that, and fit with LangChain's \"chaining chains\" approach, we setup the _retrieval_ component within a `TransformChain`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "id": "BpFmiRtYDpHp"
      },
      "outputs": [],
      "source": [
        "from langchain.chains import TransformChain\n",
        "\n",
        "def retrieval_transform(inputs: dict) -> dict:\n",
        "    docs = retriever.get_relevant_documents(query=inputs[\"question\"])\n",
        "    docs = [d.page_content for d in docs]\n",
        "    docs_dict = {\n",
        "        \"query\": inputs[\"question\"],\n",
        "        \"contexts\": \"\\n---\\n\".join(docs)\n",
        "    }\n",
        "    return docs_dict\n",
        "\n",
        "retrieval_chain = TransformChain(\n",
        "    input_variables=[\"question\"],\n",
        "    output_variables=[\"query\", \"contexts\"],\n",
        "    transform=retrieval_transform\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "SoD45Au1Eg-r"
      },
      "source": [
        "Now we chain this with our generation step using the `SequentialChain`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "id": "azqCwDwXEkDT"
      },
      "outputs": [],
      "source": [
        "from langchain.chains import SequentialChain\n",
        "\n",
        "rag_chain = SequentialChain(\n",
        "    chains=[retrieval_chain, qa_chain],\n",
        "    input_variables=[\"question\"],  # we need to name differently to output \"query\"\n",
        "    output_variables=[\"query\", \"contexts\", \"text\"]\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "xpB2aWV4ESzf"
      },
      "source": [
        "Then we perform the full RAG pipeline:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 161
        },
        "id": "JvJbUaLqFRG2",
        "outputId": "582caa21-777a-4a01-a618-9db64185ad5e"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "INFO:langchain.retrievers.multi_query:Generated queries: ['1. What information can you provide about llama 2?', '2. Could you give me some details about llama 2?', '3. I would like to learn more about llama 2. Can you help me with that?']\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "'Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. These LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are optimized for dialogue use cases. They have been shown to outperform open-source chat models on most benchmarks and are considered a suitable substitute for closed-source models based on humane evaluations for helpfulness and safety. The approach to fine-tuning and safety is described in detail in the work.'"
            ]
          },
          "execution_count": 20,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "out = rag_chain({\"question\": question})\n",
        "out[\"text\"]"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "bLmv01geK-ZS"
      },
      "source": [
        "---"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "vAZVPhHzLDQQ"
      },
      "source": [
        "## Custom Multiquery"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "rI-KVO6zjJZw"
      },
      "source": [
        "We'll try this with two prompts, both encourage more variety in search queries.\n",
        "\n",
        "**Prompt A**\n",
        "```\n",
        "Your task is to generate 3 different search queries that aim to\n",
        "answer the user question from multiple perspectives.\n",
        "Each query MUST tackle the question from a different viewpoint,\n",
        "we want to get a variety of RELEVANT search results.\n",
        "Provide these alternative questions separated by newlines.\n",
        "Original question: {question}\n",
        "```\n",
        "\n",
        "\n",
        "**Prompt B**\n",
        "```\n",
        "Your task is to generate 3 different search queries that aim to\n",
        "answer the user question from multiple perspectives. The user questions\n",
        "are focused on Large Language Models, Machine Learning, and related\n",
        "disciplines.\n",
        "Each query MUST tackle the question from a different viewpoint, we\n",
        "want to get a variety of RELEVANT search results.\n",
        "Provide these alternative questions separated by newlines.\n",
        "Original question: {question}\n",
        "```"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {
        "id": "4IlEnYeKLFzh"
      },
      "outputs": [],
      "source": [
        "from typing import List\n",
        "from langchain.chains import LLMChain\n",
        "from pydantic import BaseModel, Field\n",
        "from langchain.prompts import PromptTemplate\n",
        "from langchain.output_parsers import PydanticOutputParser\n",
        "\n",
        "\n",
        "# Output parser will split the LLM result into a list of queries\n",
        "class LineList(BaseModel):\n",
        "    # \"lines\" is the key (attribute name) of the parsed output\n",
        "    lines: List[str] = Field(description=\"Lines of text\")\n",
        "\n",
        "\n",
        "class LineListOutputParser(PydanticOutputParser):\n",
        "    def __init__(self) -> None:\n",
        "        super().__init__(pydantic_object=LineList)\n",
        "\n",
        "    def parse(self, text: str) -> LineList:\n",
        "        lines = text.strip().split(\"\\n\")\n",
        "        return LineList(lines=lines)\n",
        "\n",
        "\n",
        "output_parser = LineListOutputParser()\n",
        "\n",
        "template = \"\"\"\n",
        "Your task is to generate 3 different search queries that aim to\n",
        "answer the user question from multiple perspectives. The user questions\n",
        "are focused on Large Language Models, Machine Learning, and related\n",
        "disciplines.\n",
        "Each query MUST tackle the question from a different viewpoint, we\n",
        "want to get a variety of RELEVANT search results.\n",
        "Provide these alternative questions separated by newlines.\n",
        "Original question: {question}\n",
        "\"\"\"\n",
        "\n",
        "QUERY_PROMPT = PromptTemplate(\n",
        "    input_variables=[\"question\"],\n",
        "    template=template,\n",
        ")\n",
        "llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)\n",
        "\n",
        "# Chain\n",
        "llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "0CgduNJWLBez",
        "outputId": "7ffee6c2-27b4-4bdf-8c79-7effd27e3cd4"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "INFO:langchain.retrievers.multi_query:Generated queries: ['1. What are the key features and capabilities of Large Language Model Llama 2?', '2. How does Llama 2 compare to other Large Language Models in terms of performance and efficiency?', '3. What are the applications and use cases of Llama 2 in the field of Machine Learning and Natural Language Processing?']\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "7"
            ]
          },
          "execution_count": 22,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# Run\n",
        "retriever = MultiQueryRetriever(\n",
        "    retriever=vectorstore.as_retriever(), llm_chain=llm_chain, parser_key=\"lines\"\n",
        ")  # \"lines\" is the key (attribute name) of the parsed output\n",
        "\n",
        "# Results\n",
        "docs = retriever.get_relevant_documents(\n",
        "    query=question\n",
        ")\n",
        "len(docs)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "PSySsaDKMK1i",
        "outputId": "e6f95abd-99fc-4576-d1f4-5fd4c21c70ab"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\\nSergey Edunov Thomas Scialom\\x03\\nGenAI, Meta\\nAbstract\\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\\nmodels outperform open-source chat models on most benchmarks we tested, and based on\\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'chunk-id': '1', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'}),\n",
              " Document(page_content='2\\n3.4.3 Even programmatic measures of model capability can be highly subjective . . . . . . . 15\\n3.5 Even large language models are brittle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15\\n3.6 Social bias in large language models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17\\n3.7 Performance on non-English languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20\\n4 Behavior on selected tasks 21\\n4.1 Checkmate-in-one task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22\\n4.2 Periodic elements task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23\\n5 Additional related work 24\\n6 Discussion 25', metadata={'chunk-id': '14', 'id': '2206.04615', 'source': 'http://arxiv.org/pdf/2206.04615', 'title': 'Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models'}),\n",
              " Document(page_content='challenges described above) about how the development of large language models has unfolded thus far, including a\\nquantitative analysis of the increasing gap between academia and industry for large model development.\\nFinally, in Section 4 we outline policy interventions that may help concretely address the challenges we outline in\\nSections 2 and 3 in order to help guide the development and deployment of larger models for the broader social good.\\nWe leave some illustrative experiments, technical details, and caveats about our claims in Appendix A.\\n2 DISTINGUISHING FEATURES OF LARGE GENERATIVE MODELS\\nWe claim that large generative models (e.g., GPT-3 [ 11], LaMDA [ 78], Gopher [ 62], etc.) are distinguished by four\\nfeatures:\\n•Smooth, general capability scaling : It is possible to predictably improve the general performance of generative\\nmodels — their loss on capturing a specific, though very broad, data distribution — by scaling up the size of the\\nmodels, the compute used to train them, and the amount of data they’re trained on in the correct proportions.\\nThese proportions can be accurately predicted by scaling laws (Figure 1). We believe that these scaling laws\\nde-risk investments in building larger and generally more capable models despite the high resource costs and the\\ndifficulty of predicting precisely how well a model will perform on a specific task. Note, the harmful properties', metadata={'chunk-id': '9', 'id': '2202.07785', 'source': 'http://arxiv.org/pdf/2202.07785', 'title': 'Predictability and Surprise in Large Generative Models'}),\n",
              " Document(page_content='asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyﬁne-tunedtoalignwithhuman\\npreferences, which greatly enhances their usability and safety. This step can require signiﬁcant costs in\\ncomputeandhumanannotation,andisoftennottransparentoreasilyreproducible,limitingprogresswithin\\nthe community to advance AI alignment research.\\nIn this work, we develop and release Llama 2, a family of pretrained and ﬁne-tuned LLMs, L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle and\\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested,\\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc models generally perform better than existing open-source models. They also appear to\\nbe on par with some of the closed-source models, at least on the human evaluations we performed (see', metadata={'chunk-id': '9', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'}),\n",
              " Document(page_content='but BoolQ. Similarly, this model surpasses PaLM540B everywhere but on BoolQ and WinoGrande.\\nLLaMA-13B model also outperforms GPT-3 on\\nmost benchmarks despite being 10 \\x02smaller.\\n3.2 Closed-book Question Answering\\nWe compare LLaMA to existing large language\\nmodels on two closed-book question answering\\nbenchmarks: Natural Questions (Kwiatkowski\\net al., 2019) and TriviaQA (Joshi et al., 2017). For\\nboth benchmarks, we report exact match performance in a closed book setting, i.e., where the models do not have access to documents that contain\\nevidence to answer the question. In Table 4, we\\nreport performance on NaturalQuestions, and in Table 5, we report on TriviaQA. On both benchmarks,\\nLLaMA-65B achieve state-of-the-arts performance\\nin the zero-shot and few-shot settings. More importantly, the LLaMA-13B is also competitive on\\nthese benchmarks with GPT-3 and Chinchilla, despite being 5-10 \\x02smaller. This model runs on a\\nsingle V100 GPU during inference.\\n0-shot 1-shot 5-shot 64-shot\\nGopher 280B 43.5 - 57.0 57.2', metadata={'chunk-id': '17', 'id': '2302.13971', 'source': 'http://arxiv.org/pdf/2302.13971', 'title': 'LLaMA: Open and Efficient Foundation Language Models'}),\n",
              " Document(page_content='5 Discussion 19\\n6 Conclusion 21\\n1 Introduction: motivation for the survey and deﬁnitions\\n1.1 Motivation\\nLarge Language Models (LLMs) ( Devlin et al. ,2019;Brown et al. ,2020;Chowdhery et al. ,2022) have fueled dramatic progress in Natural Language Processing (NLP ) and are already core in several products with\\nmillions of users, such as the coding assistant Copilot ( Chen et al. ,2021), Google search engine1or more recently ChatGPT2. Memorization ( Tirumala et al. ,2022) combined with compositionality ( Zhou et al. ,2022)\\ncapabilities made LLMs able to execute various tasks such as language understanding or conditional and unconditional text generation at an unprecedented level of pe rformance, thus opening a realistic path towards\\nhigher-bandwidth human-computer interactions.\\nHowever, LLMs suﬀer from important limitations hindering a broader deployment. LLMs often provide nonfactual but seemingly plausible predictions, often referr ed to as hallucinations ( Welleck et al. ,2020). This\\nleads to many avoidable mistakes, for example in the context of arithmetics ( Qian et al. ,2022) or within\\na reasoning chain ( Wei et al. ,2022c ). Moreover, many LLMs groundbreaking capabilities seem to emerge', metadata={'chunk-id': '5', 'id': '2302.07842', 'source': 'http://arxiv.org/pdf/2302.07842', 'title': 'Augmented Language Models: a Survey'}),\n",
              " Document(page_content='practicable options for academic research since they were acquired by Appen, a company that is\\nfocused on a business market.\\nThis paper explores the potential of large language models (LLMs) for text annotation tasks, with a\\nfocus on ChatGPT, which was released in November 2022. It demonstrates that zero-shot ChatGPT\\nclassiﬁcations (that is, without any additional training) outperform MTurk annotations, at a fraction\\nof the cost. LLMs have been shown to perform very well for a wide range of purposes, including\\nideological scaling (Wu et al., 2023), the classiﬁcation of legislative proposals (Nay, 2023), the\\nresolution of cognitive psychology tasks (Binz and Schulz, 2023), and the simulation of human\\nsamples for survey research (Argyle et al., 2023). While a few studies suggested that ChatGPT\\nmight perform text annotation tasks of the kinds we have described (Kuzman, Mozeti ˇc and Ljubeši ´c,\\n2023; Huang, Kwak and An, 2023), to the best of our knowledge our work is the ﬁrst systematic\\nevaluation. Our analysis relies on a sample of 6,183 documents, including tweets and news articles', metadata={'chunk-id': '3', 'id': '2303.15056', 'source': 'http://arxiv.org/pdf/2303.15056', 'title': 'ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks'})]"
            ]
          },
          "execution_count": 23,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "docs"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "F4q65OEiizU2"
      },
      "source": [
        "Putting this together in another `SequentialChain`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {
        "id": "LTRjTIKzi2-g"
      },
      "outputs": [],
      "source": [
        "retrieval_chain = TransformChain(\n",
        "    input_variables=[\"question\"],\n",
        "    output_variables=[\"query\", \"contexts\"],\n",
        "    transform=retrieval_transform\n",
        ")\n",
        "\n",
        "rag_chain = SequentialChain(\n",
        "    chains=[retrieval_chain, qa_chain],\n",
        "    input_variables=[\"question\"],  # we need to name differently to output \"query\"\n",
        "    output_variables=[\"query\", \"contexts\", \"text\"]\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "Rda74xhpjE6A"
      },
      "source": [
        "And asking again:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "metadata": {
        "id": "9UcBY71cjGgX"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "INFO:langchain.retrievers.multi_query:Generated queries: ['1. What are the key features and capabilities of Large Language Model Llama 2?', '2. How does Llama 2 compare to other Large Language Models in terms of performance and efficiency?', '3. What are the applications and use cases of Llama 2 in the field of Machine Learning and Natural Language Processing?']\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "'Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. These models, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are optimized for dialogue use cases and have been shown to outperform open-source chat models on most benchmarks. They are considered as a suitable substitute for closed-source models in terms of helpfulness and safety. The development of Llama 2 addresses challenges such as programmatic measures of model capability, brittleness of large language models, social bias, and performance on non-English languages.'"
            ]
          },
          "execution_count": 25,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "out = rag_chain({\"question\": question})\n",
        "out[\"text\"]"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "8jULksgk7gLA"
      },
      "source": [
        "After finishing, delete your Pinecone index to save resources:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 26,
      "metadata": {},
      "outputs": [],
      "source": [
        "pc.delete_index(index_name)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "---"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
